Data Wrangling Project - WeRateDogs Twitter Archive - Report

This written report (act_report.html) presents the insights and visualisation(s) made from the wrangled data (250+ words).


The original prediction data had 3 guesses per tweet image, in decreasing order of confidence (p1 > p2 > p3), each flagged as dog or not dog and labelled with a breed or item type. For each image, we pick the most confident guess that is a dog breed; if none of the 3 predictions is a dog, we fall back to the first prediction.
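The selection rule above can be sketched as follows. This is a minimal sketch, assuming the column names (`p1`, `p1_dog`, etc.) from the project's image-predictions file; the sample rows are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the image-predictions table (column names assumed).
preds = pd.DataFrame({
    'p1': ['golden_retriever', 'paper_towel'],
    'p1_dog': [True, False],
    'p2': ['Labrador_retriever', 'Labrador_retriever'],
    'p2_dog': [True, True],
    'p3': ['kuvasz', 'banana'],
    'p3_dog': [True, False],
})

def best_dog_guess(row):
    """Return the most confident prediction that is a dog;
    fall back to the first prediction if none are dogs."""
    for i in (1, 2, 3):
        if row[f'p{i}_dog']:
            return row[f'p{i}']
    return row['p1']

preds['predict'] = preds.apply(best_dog_guess, axis=1)
print(preds['predict'].tolist())  # ['golden_retriever', 'Labrador_retriever']
```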

Out[4]:
      rating_numerator  rating_denominator  rating_score  text
162                420                  10          42.0  @dhmontgomery We also gave snoop dogg a 420/10...
163                666                  10          66.6  @s8n You tried very hard to portray this good ...
246                182                  10          18.2  @markhoppus 182/10
556                 75                  10           7.5  This is Logan, the Chow who lived. He solemnly...
802               1776                  10         177.6  This is Atticus. He's quite simply America af....
1892               420                  10          42.0  After so many requests... here you go.\r\n\r\n...

Even with WeRateDogs' rating system, where the numerator can exceed the denominator, we can still normalise the ratings by computing score = rating_numerator / rating_denominator.
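As a minimal sketch of that normalisation (toy values; the real columns come from the wrangled archive):

```python
import pandas as pd

# Toy ratings standing in for the wrangled tweet archive.
ratings = pd.DataFrame({
    'rating_numerator': [12, 420, 10],
    'rating_denominator': [10, 10, 10],
})

# Normalise: a 12/10 becomes 1.2, a 420/10 becomes 42.0, and so on.
ratings['rating_score'] = (ratings['rating_numerator']
                           / ratings['rating_denominator'])
print(ratings['rating_score'].tolist())  # [1.2, 42.0, 1.0]
```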

We see some outlier rating scores (420/10, 666/10, 182/10, 1776/10, 75/10), which are likely memes or references (420, 666, 1776). Checking some of these images, one is not even a dog.

99% percentile: [0.2 1.4]

The central 99% of calculated rating scores lies within [0.2, 1.4], so we remove the outliers (score > 3), which fall well outside this range.
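The quantile check and the fixed cutoff can be sketched like this (toy scores; the fixed cutoff of 3, as in the report, keeps legitimate >10/10 ratings while dropping the meme ratings):

```python
import pandas as pd

scores = pd.Series([0.8, 1.0, 1.2, 1.4, 42.0, 66.6])

# Bounds of the central 99% of the distribution
# (0.5th and 99.5th percentiles).
lo, hi = scores.quantile([0.005, 0.995])

# The report uses a fixed cutoff (score > 3 is an outlier) rather than
# the quantile bounds themselves.
cleaned = scores[scores <= 3]
print(cleaned.tolist())  # [0.8, 1.0, 1.2, 1.4]
```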

The outlier rating scores (7.5, 18.2, 42.0, 42.0, 66.6, 177.6) were removed in the left plot. The remaining scores cluster around 1.0 (10/10), though many scores exceed 1.0 due to WeRateDogs' rating system.

The right plot is a log-scale histogram showing that the removed outliers lay well outside the main cluster around $10^0$ (1.0). The outlier bars are barely visible, since each is a single count compared with the 50+ counts of the more common scores.

From the side-by-side histograms, we see that there are more lower scores (< 1.0) for tweets with pictures predicted to be non-dogs (right plot), compared to tweets with pictures predicted to be dogs (left plot).

From the boxplot, we can see that when the image is predicted not to be a dog, the median is at 1.0 and the 25th-75th percentile range is lower than when the image is predicted to be a dog, which has a median > 1.0.

From these side-by-side scatterplot comparisons, there appears to be a weak-to-moderate positive correlation between Rating Score and both Favorite Count and Retweet Count. There is a strong positive correlation between Favorite Count and Retweet Count.
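These pairwise relationships can be quantified with a Pearson correlation matrix. A minimal sketch with toy values standing in for the wrangled archive:

```python
import pandas as pd

# Toy data standing in for the wrangled tweet archive.
df = pd.DataFrame({
    'rating_score':   [0.8, 1.0, 1.1, 1.2, 1.3],
    'retweet_count':  [100, 300, 500, 900, 1500],
    'favorite_count': [400, 900, 1600, 2800, 5000],
})

# Pearson correlation matrix over the three numeric columns;
# off-diagonal entries quantify the scatterplot relationships.
corr = df[['rating_score', 'retweet_count', 'favorite_count']].corr()
print(corr.round(2))
```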

Plotting both retweet count and favorite count over time, we see that retweet counts increase only slightly, whereas favorite counts increase more moderately over time.
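One way to surface these trends is to resample the counts to a monthly mean before plotting. A minimal sketch with invented timestamps and counts:

```python
import pandas as pd

# Toy time series standing in for tweet timestamps and counts.
ts = pd.DataFrame({
    'timestamp': pd.to_datetime(['2016-01-01', '2016-06-01', '2017-01-01']),
    'retweet_count': [200, 250, 300],
    'favorite_count': [500, 1500, 4000],
}).set_index('timestamp')

# Monthly mean counts ('MS' = month start); plotting these smooths
# the per-tweet noise and makes the trend visible.
monthly = ts.resample('MS').mean()
```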

Plotting the rating scores over time, we see that most of the lower rating scores tended to occur in earlier tweets (before 2016-10).

From the side-by-side box plots of the 4 dog stages (Doggo, Floofer, Pupper, Puppo), we see that Doggos and Floofers tend to have higher rating scores than Non-Doggos and Non-Floofers respectively. Puppers have about the same scores as Non-Puppers. Puppos have much higher scores than Non-Puppos.

Out[13]:
predict
clumber                 2.700000
Bouvier_des_Flandres    1.300000
Saluki                  1.250000
briard                  1.233333
Tibetan_mastiff         1.225000
Name: rating_score, dtype: float64
Out[14]:
predict
Tibetan_terrier                0.925
Walker_hound                   0.900
Scotch_terrier                 0.900
soft-coated_wheaten_terrier    0.880
Japanese_spaniel               0.500
Name: rating_score, dtype: float64

Breeds with the best rating scores are: clumber, Bouvier_des_Flandres, Saluki, briard, Tibetan_mastiff. The worst-rated breeds are: Japanese_spaniel, soft-coated_wheaten_terrier, Scotch_terrier, Walker_hound, Tibetan_terrier.
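The per-breed rankings above come from averaging the rating score within each predicted breed. A minimal sketch with toy rows (real values come from the wrangled archive):

```python
import pandas as pd

# Toy rows standing in for the wrangled archive.
df = pd.DataFrame({
    'predict': ['clumber', 'Saluki', 'Saluki', 'Japanese_spaniel'],
    'rating_score': [2.7, 1.2, 1.3, 0.5],
})

# Mean rating score per predicted breed, then the best and worst.
per_breed = df.groupby('predict')['rating_score'].mean()
print(per_breed.nlargest(2))   # top-rated breeds
print(per_breed.nsmallest(1))  # worst-rated breed
```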

Out[15]:
predict
pole         1.4
limousine    1.4
pedestal     1.4
prison       1.4
dough        1.3
Name: rating_score, dtype: float64
Out[16]:
predict
slug             0.2
lacewing         0.1
electric_fan     0.1
paper_towel      0.1
traffic_light    0.0
Name: rating_score, dtype: float64

Let's check some pictures with the highest ratings (> 1.3):

Now some pictures with the lowest ratings (< 0.2):

5 best scoring pictures that were not predicted to be dogs:

Curiously, some non-dog categories such as 'pole' and 'dough' have high rating scores. Checking these images, the actual dogs are small in the frame, or the subject merely resembles a dog, which is why they were misclassified.

Pictures with the most retweet counts:

Pictures with the most favourite counts:

Given the overlap between the two lists, it makes sense that pictures with the highest retweet counts also have the highest favorite counts.
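That overlap can be checked directly by intersecting the top-N lists for each metric. A minimal sketch with toy tweets:

```python
import pandas as pd

# Toy tweets standing in for the wrangled archive.
df = pd.DataFrame({
    'text': ['tweet a', 'tweet b', 'tweet c'],
    'retweet_count': [50, 900, 300],
    'favorite_count': [200, 4000, 1000],
})

# Top-2 tweets by each metric; the intersection shows how many
# tweets top both lists.
top_rt = df.nlargest(2, 'retweet_count')
top_fav = df.nlargest(2, 'favorite_count')
overlap = set(top_rt['text']) & set(top_fav['text'])
print(overlap)  # {'tweet b', 'tweet c'}
```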